Toward Computational Processing of Less Resourced Languages: Primarily Experiments for Moroccan Amazigh Language
Abstract
The world is undergoing a huge transformation from industrial economies into an information economy, in which the indices of value are shifting from material to non-material resources. This transformation has been rightly described as a revolution that is accompanied by considerable dangers for the future and the survival of many languages and their associated cultures. Recent years have seen a growing interest in applying language processing methods to languages other than English. However, most of the development of language processing tools and methods has so far concentrated on a fairly small number of languages, mainly European and East Asian ones.
Similar resources
Natural Language Processing for Amazigh Language: Challenges and Future Directions
The Amazigh language, as one of the Afro-Asiatic languages, poses many challenges for natural language processing. Its writing system, its morphology (based on a distinctive root-and-pattern word-formation process) and the lack of linguistic corpora make computational approaches to Amazigh challenging. In this paper, we give an overview of the current state of the art in Natural Language Proc...
An Arabic-Moroccan Darija Code-Switched Corpus
In multilingual communities, speakers often switch between languages or dialects within the same context. This phenomenon is called code-switching. It can be observed, e.g., in the Arab world, where Modern Standard Arabic and Dialectal Arabic coexist. Recently, the computational treatment of code-switching has received attention. Just as other natural language processing tasks, this task requir...
Amazigh Verb Conjugator
With the aim of preserving the Amazigh heritage from the threat of disappearance, it seems appropriate to provide Amazigh with the resources required to meet the challenges of access to New Information and Communication Technologies (ICT). In this context, and with a view to building linguistic resources and natural language processing tools for this language, we have undertak...
A Comparison of Three Machine Learning Methods for Amazigh POS Tagging
Part-of-speech (POS) tagging plays a crucial role in different fields of natural language processing (NLP), including Speech Recognition, Natural Language Parsing, Information Retrieval and Multi-Word Term Extraction. This paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Amazigh texts, using a tagset of 28 tags. The taggers...
Morphological analysis for less-resourced languages: Maximum Affix Overlap applied to Zulu
The paper describes a collaborative approach, currently in progress, to morphological analysis of less-resourced languages. The approach is based, firstly, on a language-independent machine learning algorithm, Maximum Affix Overlap, that generates candidates for morphological decompositions from an initial set of language-specific training data; and, secondly, on language-dependent post-processing using langua...
Publication date: 2012